Methods and apparatus to measure market statistics

ABSTRACT

Example methods and apparatus to measure market statistics are disclosed. A disclosed example method includes identifying, by executing an instruction with a processor, a household including self reported data stored in a data source, retrieving, by executing an instruction with the processor, an Internet protocol (IP) address from the household, identifying, by executing an instruction with the processor, a hostname label based on a reverse resolve of the IP address, determining whether the hostname label matches the self reported data, with the processor, and when the hostname label matches the self reported data, correcting, by executing an instruction with the processor, an error metric of the data source by adjusting a weight value associated with the self reported data.

RELATED APPLICATION

This patent arises from U.S. patent application Ser. No. 12/695,793, filed on Jan. 28, 2010, entitled “Methods and Apparatus to Measure Market Statistics,” granted as U.S. Pat. No. 9,129,293 on Sep. 8, 2015, which claims the benefit of U.S. Provisional application Ser. No. 61/148,251, filed on Jan. 29, 2009, which are hereby incorporated by reference herein in their entireties.

FIELD OF THE DISCLOSURE

This disclosure relates generally to market research and, more particularly, to methods and apparatus to measure market statistics.

BACKGROUND

Service providers, such as media service providers and/or Internet service providers (ISPs) that choose to participate in a market typically need to acquire information about their competitors. Competitive information allows the provider to employ strategic and/or tactical decisions related to opportunities that may increase a subscriber base and/or identify which market areas may be particularly receptive to the services provided by the provider. Additionally, information about the provider and its competitors permits a comparison to reveal market presence and/or market dominance.

Obtaining information related to the presence of competitive providers and/or the market share in any particular geographic market may entail conducting surveys. Surveys, whether oral or written, typically yield low sample rates when compared to the total number of existing subscribers. Additionally, answers to the surveys are usually provided by a human respondent, who is prone to inaccuracy regarding details of their existing provider. For example, a human respondent may state the name of their browser application or computer manufacturer instead of the name of their provider.

Additionally, because oral and written surveys are perceived as a burden to subscribers, providers are not likely to enjoy opportunities to determine whether the subscriber's status has changed. For example, a subscriber to a provider is not typically bound by contracts that restrict and/or discourage competitive shopping with alternate providers. Thus, if the subscriber agrees to answer survey questions at a first time, such subscriber is not likely to also agree to another survey question at a second time (e.g., two months after the first survey). Instead, the subscriber is likely to view the additional survey questions as a burden not worthy of their time.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of an example system to measure market statistics.

FIG. 2 is a block diagram of an example market share evaluator shown in FIG. 1.

FIGS. 3-8, and 11-13 are flowcharts representative of example processes that may be performed to implement one or more entities of the example systems and apparatus of FIGS. 1, 2, and 10, and/or the example data charts of FIGS. 9A and 9B.

FIGS. 9A and 9B are example data charts generated by the example systems and apparatus of FIGS. 1, 2, and 10 and/or the example processes of FIGS. 3-8 and 11-13.

FIG. 10 is a block diagram of another example system to measure market statistics.

FIG. 14 is a block diagram of an example processor system that may be used to execute the example processes of FIGS. 3-8 and 11-13 to implement the example systems, apparatus, and/or methods described herein.

DETAILED DESCRIPTION

Although the following discloses example methods and apparatus including, among other components, software executed on hardware, such methods and apparatus are merely illustrative and should not be considered as limiting. For example, any or all of these hardware and software components could be embodied exclusively in hardware, exclusively in software, or in any combination of hardware and software. Accordingly, while the following describes example methods, systems, and apparatus, the examples provided are not the only way to implement such methods, systems, and apparatus.

Example methods and apparatus to measure market statistics are disclosed. A disclosed example method includes retrieving a service penetration value from a managed data source, retrieving an Internet protocol (IP) address from a convenience data source, and identifying a household identification number associated with the retrieved IP address. The example method also includes retrieving household data associated with the household identification number, verifying a service provider name provided by the convenience data source, and adjusting a weight value of the retrieved household data when the service provider name provided by the convenience data source is different than a current service provider name. Additionally, the example method includes calculating service provider flow share based on the adjusted weight value and the service penetration value, and generating a report including the calculated service provider flow share.

A disclosed example apparatus includes a registry manager to retrieve a service penetration value from a managed data source, a data source locator to identify a convenience data source associated with a household identification number and obtain an Internet protocol (IP) address associated with the household identification number, and a network interface to retrieve data from the convenience data source associated with the household identification number. The example apparatus also includes a data source combiner to generate a verified service provider name based on a service provider name retrieved from the convenience data source and associated with the IP address, a data source weight calculator to adjust a weight value of the retrieved data from the convenience data source when the service provider name retrieved from the convenience data source is different than the verified service provider name, and a test manager to calculate service provider flow share based on the adjusted weight value from the convenience data source and a calculated market share based on the service penetration value.

While examples below describe Internet service providers (ISPs), such examples are provided for convenience and not to be construed as limited to ISPs and the methods and apparatus described herein may be applied to any other type of provider. In the event that a provider of services, such as a media provider (e.g., a cable television provider, a satellite provider) and/or an ISP gains or loses subscribers, the provider is particularly interested in learning whether competitors experience a similar gain or loss. For example, if the ISP and one or more competitors experience similar gains in subscribers, then the general market area may be experiencing growth. Such an indication may prompt the ISP to increase advertising and/or promotional resources to attempt to capture such growth (e.g., market share) opportunities before one or more competitors capture a greater portion of the available subscribers. Alternatively, if the ISP and one or more competitors experience subscriber losses, then the general market area may be stagnant and/or shrinking, which may indicate advertising and/or promotional efforts should focus on one or more alternate markets. On the other hand, in the event that the ISP market share is higher than one or more competitors and/or rate of new subscribers is increasing as compared to one or more competitors, then the ISP may gain insight to the effectiveness of an advertising campaign and/or a promotion.

The methods and apparatus described herein identify, in part, candidate Internet protocol (IP) addresses that may be associated with subscribers, whether the IP addresses are active or inactive, the ISP associated with each IP address, and/or the location of the IP address. As described in further detail below, the methods and apparatus described herein allow a user to collect broadband market data for one or more market(s) of interest and/or from one or more data sources of interest. As such, the user is better able to calculate market share for one or more broadband ISPs because, in part, market statistic calculation confidence increases when a statistically significant amount of representative data is available. An example market share database is updated on a scheduled basis, a manual basis, periodically, or aperiodically to store known ISPs, name servers, and/or router nomenclature (e.g., ge-0-1-ubr01.warren.ma.boston.comcast.net) associated with ISPs and/or IP addresses. In particular, the example methods and apparatus described herein identify one or more key words, letters, numbers, labels, and/or combinations thereof that may be indicative of geography (e.g., “ma,” “boston,” etc.) and/or one or more markets of interest.

FIG. 1 is an illustration of an example system 100 to measure broadband market statistics. In the illustrated example of FIG. 1, the system 100 includes a market share evaluator 102 communicatively connected to a registry database 104 and a market information database 106. The example registry database 104 may include one or more databases that identify which registrant is associated with one or more registered IP addresses and/or domain names. Such registry databases 104 include, but are not limited to, information from the American Registry for Internet Numbers (ARIN), the Internet Corporation for Assigned Names and Numbers (ICANN), IP Geolocation by Maxmind, etc. Additionally, the example market information database 106 includes resolution keywords in an example resolution keyword table 107 to, in part, allow resolution of one or more hostname labels that may be cryptic and/or otherwise unfamiliar to a user of the methods and apparatus described herein. For example, in response to an example Whois query to ARIN, one or more labels may be returned having no apparent relationship to a state, a city, and/or a known ISP name. Accordingly, the example resolution keywords 107 may be used to decipher such labels.

The example market share evaluator 102 is also communicatively connected to one or more networks 108, such as the Internet. Any number of ISPs, such as example ISP “A” 110, ISP “B” 112, and ISP “C” 114, are communicatively connected to the network 108, in which each ISP provides one or more services to household communication devices 116 a-d. Household communication devices 116 a-d may include, but are not limited to, routers, modems and/or other equipment communicatively connected to an ISP. Each example ISP may be responsible for providing services (e.g., household Internet access, telephony, and/or media services) in an identified geographic market area. In the illustrated example of FIG. 1, ISP “A” 110 provides services to one or more household communication devices 116 a, and ISP “B” 112 provides services to one or more household communication devices 116 b, both of which are located in an example first market area 118. On the other hand, example ISP “C” 114 provides services to one or more household communication devices 116 c in an example second market area 120.

In the illustrated example of FIG. 1, ISP “A” 110 provides services to one or more household communication devices in both the first market area 118 and the second market area 120. As described in further detail below, each household communication device is assigned an IP address (e.g., a static IP address or a dynamic IP address) by the ISP that is registered for use and/or management by that ISP. In the event that the ISP of interest has a presence in more than one market area (e.g., both the first market area 118 and the second market area 120), then the methods and apparatus described herein perform one or more techniques to ascertain the most likely geographic indicator (e.g., a state, a city, etc.) for the IP address assigned to the household communication device(s). For example, if one of the example household communication devices 116 a has an IP address of 208.180.36.166, and one of the example household communication devices 116 d has an IP address of 208.180.250.75, both of which are registered by ISP “A” 110, then the methods and apparatus described herein associate each IP address with a corresponding ISP router that provides geographic location cues. In particular, the methods and apparatus described herein may identify that the last router hop in the communicative path to IP address 208.180.36.166 is associated with hostname “s208-180-36-166.snjs.ca.sta.suddenlink.net,” which includes one or more geographic location cues (e.g., “snjs” refers to San Jose).

Without limitation, the methods and apparatus described herein test one or more IP addresses to determine whether the IP address of interest is active or inactive. Generally speaking, an ISP registers one or more blocks of IP addresses that may be assigned to subscribers. ARIN is one of five Regional Internet Registries (RIRs) that manage IP address resources. Other RIRs include the Asia Pacific Network Information Centre (APNIC) and the Latin American and Caribbean IP Address Regional Registry (LACNIC). ARIN facilitates one or more processes that allow a user (e.g., an ISP) to register (usually for a fee) one or more IP addresses for exclusive use and to associate the registered IP address with a hostname. ARIN also facilitates a database lookup facility to allow queries (e.g., Whois queries) of an IP address to return a corresponding hostname, organization information, and/or whether the organization has other registered IP addresses (e.g., a block of IP addresses). However, while the organization information may include a corresponding contact name and/or address, such organization information may not be indicative of where the registered IP address(es) are being used (located). To illustrate, the IP address 68.87.148.146 is associated with, at the time of this writing, Comcast Cable Communications, Inc. having an organization address in Mt. Laurel, N.J. The ARIN database also indicates that this organization has registered a block of IP addresses ranging from 68.80.0.0 to 68.87.255.255. However, a User Datagram Protocol (UDP) and/or an Internet Control Message Protocol (ICMP) trace-route of the IP address 68.87.148.146 reveals that the last router hostname is “ge-0-1-ubr01.warren.ma.boston.comcast.net.” As described above, the labels within the example hostname provide at least two geographic cues (i.e., “ma,” and “boston”) of where the IP address of interest is likely being used. In this example, the most likely geographic location associated with the IP address 68.87.148.146 is Boston, Massachusetts.
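
As an illustration of the registered-block relationship described above, the following minimal Python sketch checks whether an IP address of interest falls inside an inclusive registered range. It relies only on the standard `ipaddress` module; the helper names `block_networks` and `in_block` are hypothetical and do not appear in this disclosure.

```python
import ipaddress


def block_networks(first, last):
    """Return the CIDR networks that exactly cover the inclusive range first..last."""
    return list(
        ipaddress.summarize_address_range(
            ipaddress.ip_address(first), ipaddress.ip_address(last)
        )
    )


def in_block(ip, first, last):
    """Return True when ip lies inside the inclusive registered range."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in block_networks(first, last))


# Registered range quoted in the example above.
REGISTERED_BLOCK = ("68.80.0.0", "68.87.255.255")

if __name__ == "__main__":
    print(block_networks(*REGISTERED_BLOCK))
    print(in_block("68.87.148.146", *REGISTERED_BLOCK))
```

Membership in the block only confirms the registrant; as noted above, it says nothing about where the address is actually being used.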

While the concept of a hostname is to, in part, facilitate a human-readable association with a specific IP address, the hostname and/or one or more labels of the hostname concatenated with dots may still appear cryptic to a human attempting to read it. For example, labels within the example hostname “snjs.ca.sta.suddenlink.net” may not readily appear to reveal useful information to persons unfamiliar with the San Jose area. Alternatively, users may confuse the label “ca” with nomenclature associated with Canada rather than California. To better resolve hostname labels and determine geographic and/or organizational association(s), the methods and apparatus described herein parse the hostname labels and search the example market information database 106 for matching (e.g., logical ANDed) combinations. In the event that a match of two or more labels is found after a logical AND, a candidate geographic location and/or associated organization associated with the IP address may be determined. To illustrate with the above-identified example hostname “snjs.ca.sta.suddenlink.net,” the methods and apparatus described herein search for a combination of labels “snjs,” “ca,” and “suddenlink” before concluding that the router hostname is properly associated with the organization Suddenlink Communications, Inc. located in San Jose, Calif.
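
The label matching described above can be sketched as follows. This is a minimal Python illustration, not the disclosed implementation: the dictionary `RESOLUTION_KEYWORDS` is a hypothetical stand-in for the resolution keyword table 107, and requiring a city, a state, and an organization to all match is one assumed form of the logical AND.

```python
# Hypothetical stand-in for the resolution keyword table 107.
RESOLUTION_KEYWORDS = {
    "snjs": ("city", "San Jose"),
    "boston": ("city", "Boston"),
    "ca": ("state", "California"),
    "ma": ("state", "Massachusetts"),
    "suddenlink": ("organization", "Suddenlink Communications, Inc."),
    "comcast": ("organization", "Comcast Cable Communications, Inc."),
}


def resolve_hostname(hostname, table=RESOLUTION_KEYWORDS,
                     required=("city", "state", "organization")):
    """Split a hostname into labels and resolve them against the keyword table.

    Returns a dict of resolved fields only when every required kind of label
    matched (the logical AND); otherwise returns None.
    """
    found = {}
    for label in hostname.lower().split("."):
        if label in table:
            kind, value = table[label]
            # Keep the first match of each kind.
            found.setdefault(kind, value)
    if all(kind in found for kind in required):
        return found
    return None


if __name__ == "__main__":
    print(resolve_hostname("snjs.ca.sta.suddenlink.net"))
    print(resolve_hostname("ge-0-1-ubr01.warren.ma.boston.comcast.net"))
```

Because “ca” is only accepted in combination with a matching city and organization, a lone “ca” label is not enough to conclude California (or Canada).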

Returning to the illustrated example of FIG. 1, one or more household communication devices may be associated with one or more panelists, such as panelist router “A” 122 a, panelist router “B” 122 b, and panelist router “C” 122 c. One or more panelist households may be selected by, for example, a marketing entity to obtain a statistically significant sample size of behavior so that projections to a larger universe may be made with an acceptable degree of confidence. Panelists are typically selected in sufficient numbers to achieve statistical significance, and represent one or more particular geographic and/or demographic aspects of a larger universe of participants (e.g., subscribers of Internet services, viewers of media devices, Hispanic consumers, Japanese consumers, etc.). One or more households typically become a panelist based on, in part, an agreement with the monitoring entity (e.g., the marketing entity) to have one or more household behaviors monitored. Household behaviors of interest to the monitoring entity include, but are not limited to, television programs watched, dates/times at which programs are watched, shopping behaviors, web sites visited, and/or household services purchased (e.g., Internet services, telephony services, etc.). The agreement with the household panelists also includes explicit information related to the household address, thereby revealing their location. In the illustrated example of FIG. 1, the panelists 122 a-c also provide information related to the IP address assigned to their router(s) and the ISP with which the household subscribes.

As described in further detail below, the example market share evaluator 102 periodically, aperiodically, manually, or on a scheduled basis retrieves updates from the panelist households 122 a-c to obtain the most recent IP address and ISP used by that household. For example, each panelist household may have one or more personal computers running one or more applications that determine the assigned router IP address and e-mail such IP address information back to the marketing entity. In the event that a household decides to change which ISP supplies Internet services, the example market share evaluator 102 updates the example market information database 106 with information related to the new and/or alternate ISP. In this manner, the system 100 may stay apprised of additional ISPs that enter the first market area 118 and/or the second market area 120. Additionally, the example market share evaluator 102 identifies one or more new and/or alternate hostname labels indicative of the new and/or alternate ISP. As a result, hostname resolution related to IP address location and its associated organization (e.g., its associated ISP) can be performed using the proper label(s).

FIG. 2 is a detailed schematic illustration of the example market share evaluator 102 shown in FIG. 1. The example market share evaluator 102 includes a test manager 202, an IP address aggregator 204, and an activity determiner 205 that includes an example ping manager 206 and a port scan manager 208. The example market share evaluator 102 also includes a registry manager 210 that is communicatively connected to the example registry database 104. Further, the example market share evaluator 102 includes a hostname resolver 212, a network interface 214 communicatively connected to the example network 108 (e.g., the Internet), a panelist updater 216, a data source locator 218, a data source weight calculator 220, and a data source combiner 222. The example test manager 202 is communicatively connected to the market information database 106, the IP address aggregator 204, the ping manager 206, the port scan manager 208, the registry manager 210, the hostname resolver 212, the network interface 214, the panelist updater 216, the data source locator 218, the data source weight calculator 220, and the data source combiner 222 to invoke one or more services thereof. The example test manager 202, on a periodic, scheduled, aperiodic, and/or manual basis, invokes a test of one or more IP addresses to determine whether each is active or inactive. Additionally, the example test manager 202 determines a corresponding organization and location associated with the IP address. Once information related to an IP address active/inactive status, an organization, and a location is obtained, the example test manager 202 calculates market share statistics for each organization that provides Internet services to subscribers.

In operation, the example test manager 202 invokes the example IP address aggregator 204 to obtain one or more IP addresses of interest when performing a test. For example, the IP address aggregator 204 may provide one or more IP addresses of interest from the market information database 106, which may store IP addresses identified from one or more previous tests to determine an active/inactive state. As such, the data within the example market information database 106 may be kept current. Without limitation, the example IP address aggregator 204 selects one or more IP addresses of one or more panelist households 122 a-c. Additionally, the IP address aggregator 204 may select one or more neighboring IP addresses that could be part of a block of IP addresses registered by the ISP associated with the panelist. The IP address aggregator 204 may also invoke the registry manager 210 to obtain one or more IP addresses associated with a specific organization/ISP. For example, the IP address aggregator 204 may specify an organization name “WOW MEDIA” to determine which IP addresses and/or blocks of IP addresses are registered by the organization/ISP of interest. In operation, the example registry manager 210 contacts at least one registry database, such as ARIN, and provides organizational name keywords as input to the registry database. In response to the query, the example registry manager 210 receives corresponding IP addresses, a seed starting IP address, and/or blocks of IP addresses associated with the keyword input(s). Rather than test all of the returned IP addresses identified from the registry database query, the example test manager 202 may randomly select a predetermined number of IP addresses from the block (e.g., a subset of the block) to test. As such, a representative random sample of IP addresses may provide a reasonable indication of market presence of the organization. Additionally or alternatively, the example test manager 202 may employ the seed starting IP address as a randomly selected IP address within the range associated with the organization. The randomly selected seed starting IP address may serve as a starting IP address, an ending IP address, and/or a midpoint IP address within a range of IP addresses to test.
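
The random selection of a subset of a registered block can be illustrated with the short Python sketch below. It assumes a uniform sample without replacement over an inclusive address range; the function name `sample_block` and the optional `seed` parameter (included only to make runs repeatable) are hypothetical.

```python
import ipaddress
import random


def sample_block(first, last, count, seed=None):
    """Pick up to `count` distinct addresses uniformly from the inclusive range."""
    start = int(ipaddress.ip_address(first))
    end = int(ipaddress.ip_address(last))
    if start > end:
        raise ValueError("first address must not exceed last address")
    size = end - start + 1
    rng = random.Random(seed)
    # Sampling offsets from a range avoids materializing the whole block.
    offsets = rng.sample(range(size), min(count, size))
    return [str(ipaddress.ip_address(start + offset)) for offset in offsets]


if __name__ == "__main__":
    # Draw five hundred candidate addresses from an example registered block.
    candidates = sample_block("68.80.0.0", "68.87.255.255", 500, seed=1)
    print(len(candidates), candidates[:3])
```

Sampling offsets rather than enumerating addresses keeps memory use small even for large registered blocks.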

To determine whether an IP address of interest is active, the example ping manager 206 performs an ICMP Ping operation via the example network interface 214 using the IP address of interest. Generally speaking, a ping is a computer network tool that determines whether a host is reachable across a network by sending ICMP echo request packets to the target IP address of interest. After sending each echo request packet, the example ping manager 206 listens via the network interface 214 for ICMP echo reply messages. If such reply messages are received, then the IP address of interest is deemed active. In some instances, an ISP and/or one or more routers in the path may block a ping request (e.g., for security concerns), at which point the example port scan manager 208 attempts to scan one or more ports (e.g., TCP port 80 http service, TCP port 443 https service, etc.) of the machine associated with the IP address of interest. A successful port scan results in the IP address of interest being deemed active. Generally speaking, some ISPs may block either or both of a port scan or ping. If neither the ping nor the port scan is successful, the IP address of interest is deemed inactive and the market information database 106 is updated accordingly.
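
The ping-then-port-scan decision described above can be sketched in Python as follows. Because sending ICMP echo requests typically requires raw sockets or elevated privileges, the sketch takes the ping as an injected callable (an assumption made for this illustration); the port check uses a plain TCP connection attempt from the standard `socket` module. The names `port_open` and `is_active` are hypothetical.

```python
import socket


def port_open(ip, port, timeout=2.0):
    """Return True when a TCP connection to (ip, port) succeeds within the timeout."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def is_active(ip, ping, port_check=port_open, ports=(80, 443)):
    """Deem an address active if the ping succeeds or, failing that, any port scan does.

    `ping` is a caller-supplied function taking an IP string and returning a bool.
    """
    if ping(ip):
        return True
    return any(port_check(ip, port) for port in ports)


if __name__ == "__main__":
    # A stand-in ping that always fails forces the port-scan fallback.
    print(is_active("127.0.0.1", ping=lambda ip: False))
```

An address is reported inactive only when both probes fail, which mirrors the fallback order described above.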

IP addresses of interest having a successful ping or port scan are further associated with information obtained from one or more registry databases 104, such as the ARIN database that stores registry information for IP addresses in North America, Canada, the Caribbean, and the North Atlantic Islands. A query to the ARIN database is also referred to as a Whois query, which accepts the IP address or hostname as input and returns information including, but not limited to, an associated organization, a contact person(s), contact telephone number(s), a contact address, and/or contact e-mail address. The organizational information returned from the ARIN query is compared to the market information database 106 for a match that indicates whether the organization is a corporation, a university, or an ISP. If the IP address of interest is associated with an ISP, then it is also deemed to be used for residential purposes.

The methods and apparatus described herein further determine a corresponding location and more detailed ownership information. In other words, while the ARIN query described above identifies the organization that registered the IP address of interest, such organization may be a reseller of IP addresses rather than the ultimate user. As such, the example hostname resolver 212 of FIG. 2 performs a reverse domain name server (DNS) lookup (sometimes referred to as NSLookup (name-server lookup)) on the IP address to identify one or more cues indicative of true ownership. For example, an ARIN database query on the IP address 64.236.16.20 identifies “AOL Transit Data Network” as the organization that has registered IP addresses ranging from 64.236.0.0 to 64.236.255.255. However, the ultimate party and/or end-user for the IP address 64.236.16.20 is more accurately determined based on a reverse DNS lookup (also referred to as a reverse resolve) performed by the example hostname resolver 212, which reveals a hostname of “www2.cnn.com.” Accordingly, the example hostname resolver 212 parses the hostname, extracts “cnn” and confirms the true end-user as CNN if a label-match is found in the example market information database 106.
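
A reverse resolve followed by a label match might look like the following Python sketch. It uses `socket.gethostbyaddr` from the standard library for the reverse DNS lookup; the `lookup` parameter, the function names, and the `known_labels` dictionary (a stand-in for entries in the market information database 106) are assumptions made for illustration.

```python
import socket


def reverse_resolve(ip, lookup=socket.gethostbyaddr):
    """Return the primary hostname for ip, or None when no reverse record resolves."""
    try:
        hostname, _aliases, _addresses = lookup(ip)
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        return None
    return hostname


def end_user_from_hostname(hostname, known_labels):
    """Return the end-user named by the first hostname label found in known_labels."""
    for label in hostname.lower().split("."):
        if label in known_labels:
            return known_labels[label]
    return None


if __name__ == "__main__":
    # Hypothetical label table; results depend on live DNS at run time.
    labels = {"cnn": "CNN"}
    name = reverse_resolve("64.236.16.20")
    print(name, end_user_from_hostname(name, labels) if name else None)
```

Live results depend on the reverse records published at the time of the query, so the hostname quoted above may no longer be returned.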

To determine a location associated with the IP address of interest, the example hostname resolver 212 performs a multi-location trace-route. Generally speaking, a trace-route is a computer network application to identify the routers traversed by packets in an IP network. The trace-route application identifies multiple hops of routers, starting with the router closest to the requestor and ending with the last router closest to the target IP address. Router identification during each hop includes, in part, a name for each router. Router hops may be influenced by, in part, one or more firewalls that cause alternate hop destinations. Typically, the last router listed in a series of router hops is geographically closest to the device associated with the IP address of interest. Depending on the location of the requestor that is performing a trace-route test, one or more router hops may be different when compared to a requestor in a separate location. In both cases, the last router hop is usually the same, but in some instances the last router hop may identify a different router. Such alternate paths may occur in the example event where one or more router path(s) are down due to, for example, hardware failure(s), power failure(s), and/or localized weather interruptions.

To improve the confidence level that the last router hop is correct during the trace-route, the example hostname resolver 212 performs a multi-location trace-route, in which the originating trace-route test uses the same IP address of interest, but at separate originating locations. For example, a first trace-route for IP address 24.32.38.55 may originate in San Francisco, California, a second trace-route for the same IP address may originate in Dallas, Texas, and a third trace-route for the same IP address may originate in Lindbergh, Virginia. While all three of these originating locations will include one or more initial unique router hops, the ultimate path of test packets is expected to reach the same last hop router. If all trace-route tests identify the same last hop router information (e.g., s208-180-36-166.snjs.ca.sta.suddenlink.net), then the example hostname resolver 212 parses the labels of the hostname for cues indicative of location. Such cues (e.g., labels “snjs,” “ca,” and “suddenlink”) are compared against those labels stored in the example market information database 106 to resolve, or otherwise translate cryptic abbreviations and/or codes that refer to an organization (e.g., an ISP), a city (e.g., “snjs”), a state (e.g., “ca,” “ma,” etc.), or any other geographic identifier. On the other hand, if not all of the separate originating locations ultimately yield the same last hop router, then the example hostname resolver 212 may employ a threshold test before parsing labels from the hostname for location cues. In other words, a lower degree of confidence that the last hop router is the actual IP address location occurs when fewer than all separate originating locations yield the same last hop router name.
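
The agreement check across originating locations can be sketched as below. The disclosure does not specify the form of the threshold test, so this Python illustration assumes a simple one: accept the most frequently observed last hop only when the fraction of originating locations that reported it meets a caller-chosen threshold. The function name `consensus_last_hop` and the input mapping are hypothetical.

```python
from collections import Counter


def consensus_last_hop(last_hops, threshold=1.0):
    """Return the agreed last-hop hostname, or None when agreement is below threshold.

    `last_hops` maps an originating location to the last-hop hostname observed
    from that location (None when the trace-route did not complete).
    """
    observed = [hop for hop in last_hops.values() if hop]
    if not observed:
        return None
    hostname, votes = Counter(observed).most_common(1)[0]
    # Agreement is measured against all originating locations, including failures.
    if votes / len(last_hops) >= threshold:
        return hostname
    return None


if __name__ == "__main__":
    hops = {
        "San Francisco": "s208-180-36-166.snjs.ca.sta.suddenlink.net",
        "Dallas": "s208-180-36-166.snjs.ca.sta.suddenlink.net",
        "Virginia": "s208-180-36-166.snjs.ca.sta.suddenlink.net",
    }
    print(consensus_last_hop(hops))
```

With the default threshold of 1.0, the hostname is parsed for location cues only when every originating location agrees; a lower threshold trades confidence for coverage.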

After identifying whether the IP address of interest is active or inactive, and after identifying a corresponding IP address owner (e.g., a true end-user rather than just the organization named by the registry), and after identifying a corresponding location of the IP address of interest, the methods and apparatus described herein save such information to the example market information database 106 (e.g., as tabular information). Typically, one or more entities interested in measuring broadband market statistics prefer to obtain a sufficient amount of sample measurements before any calculated results will be deemed statistically significant. As such, the methods and apparatus described herein may test any number of IP addresses of interest before calculating a corresponding market share of active IP addresses for any given time period (e.g., calculated market share per day, per week, per bi-week, per month, etc.). For example, if the example market share evaluator 102 tests five-hundred IP addresses associated with a first ISP and five-hundred IP addresses associated with a second ISP, then the example test manager 202 calculates a corresponding percentage of active IP addresses for each of the first and second ISP. Such calculated percentages may occur, for example, on a weekly basis to determine a projected broadband market presence per geographic area and/or identify one or more trends of market share penetration for competing ISPs.
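
The per-ISP percentage of active IP addresses mentioned above reduces to a simple tally. The following Python sketch assumes test results arrive as (ISP name, active flag) pairs; the function name `active_percentages` and the counts in the usage example are invented for illustration and are not measured data.

```python
def active_percentages(results):
    """Return the percentage of tested addresses found active, keyed by ISP.

    `results` is an iterable of (isp_name, is_active) pairs, one per tested address.
    """
    totals = {}
    active = {}
    for isp, is_active in results:
        totals[isp] = totals.get(isp, 0) + 1
        active[isp] = active.get(isp, 0) + (1 if is_active else 0)
    return {isp: 100.0 * active[isp] / totals[isp] for isp in totals}


if __name__ == "__main__":
    # Illustrative counts only: five hundred tested addresses per ISP.
    tests = (
        [("first ISP", True)] * 300 + [("first ISP", False)] * 200
        + [("second ISP", True)] * 125 + [("second ISP", False)] * 375
    )
    print(active_percentages(tests))
```

Repeating the tally for each time period (e.g., weekly) yields the series from which trends may be identified.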

IP addresses previously identified as active and/or associated with an organization are saved to the example market information database 106 for subsequent testing at a later date/time. For example, subscribers to ISP services may leave the ISP for a competitor's ISP, thereby causing the previously active IP address to become inactive/dormant. Subsequent testing of that same IP address may reveal that the IP address remains dormant/inactive for a period of time, or is reallocated to another subscriber. Additionally, performing one or more subsequent tests on the previously identified IP address(es) may prevent and/or minimize the need for additional queries to the registry database(s), in which each query may be associated with an access fee.

The methods and apparatus described herein also combine two or more data sources to, in part, obtain a sufficient number of samples for making projections to one or more larger populations and/or to utilize one or more data sources that, standing alone, may not include a sample size large enough or representative enough to allow for statistically significant calculation(s). As described in further detail in connection with FIGS. 11-13, the example data source locator 218 identifies one or more data sources stored and/or otherwise identified by an example database information pool 1060 of FIG. 10. For example, as new managed and/or unmanaged (convenience) data sources are identified and/or learned by a user, such data sources may be stored in the example database information pool 1060 along with characteristics thereof. Data source characteristics may include, but are not limited to, information related to the sample size of the data source, date(s) for which samples were obtained, geographies in which the samples were obtained, and/or a corresponding weighting factor of the data source(s). In the event that one or more data sources identified in the database information pool 1060 do not include a corresponding weighted value, which is typically indicative of a geographic sample scope, the example data source weight calculator 220 performs one or more weight calculations to generate a weighted value based on the corresponding geographic scope of the identified data source. Additionally, the example data source combiner 222 combines one or more data sources to increase a representative sample size, which allows for one or more projections and/or market statistic calculations for larger populated areas of interest.

While an example system 100 to measure broadband market statistics and an example market share evaluator 102 have been illustrated in FIGS. 1 and 2, one or more of the interfaces, data structures, elements, processes and/or devices illustrated in FIGS. 1 and 2 may be combined, divided, re-arranged, omitted, eliminated and/or implemented in any other way. Further, the example market share evaluator 102, the example test manager 202, the example IP address aggregator 204, the example ping manager 206, the example port scan manager 208, the example registry manager 210, the example hostname resolver 212, the example network interface 214, the example panelist updater 216, the example data source locator 218, the data source weight calculator 220, and/or the data source combiner 222 of FIGS. 1, 2, and 10 may be implemented by hardware, software, and/or firmware. Thus, for example, any of the example market share evaluator 102, the example test manager 202, the example IP address aggregator 204, the example ping manager 206, the example port scan manager 208, the example registry manager 210, the example hostname resolver 212, the example network interface 214, the example panelist updater 216, the example data source locator 218, the data source weight calculator 220, and/or the data source combiner 222 may be implemented by one or more circuit(s) (ASIC(s)), programmable logic device(s) (PLD(s)), and/or field programmable logic device(s) (FPLD(s)), etc.
When any of the appended claims are read to cover a purely software and/or firmware implementation, at least one of the example market share evaluator 102, the example test manager 202, the example IP address aggregator 204, the example ping manager 206, the example port scan manager 208, the example registry manager 210, the example hostname resolver 212, the example network interface 214, the example panelist updater 216, the example data source locator 218, the data source weight calculator 220, and/or the data source combiner 222 is hereby expressly defined to include a tangible medium, such as a memory, a digital versatile disc (DVD), a compact disc (CD), etc. storing the firmware and/or software. Further still, a communication system may include interfaces, data structures, elements, processes, and/or devices instead of, or in addition to, those illustrated in FIGS. 1, 2, and 10 and/or may include more than one of any or all of the illustrated interfaces, data structures, elements, processes and/or devices.

FIGS. 3-8 and 11-13 illustrate example processes that may be performed to implement the example market share evaluator 102 of FIGS. 1, 2, and 10. The example processes of FIGS. 3-8 and 11-13 may be carried out by a processor, a controller, and/or any other suitable processing device. For instance, the example processes of FIGS. 3-8 and 11-13 may be embodied in coded instructions stored on any tangible computer-readable medium such as a flash memory, a CD, a DVD, a floppy disk, a read-only memory (ROM), a random-access memory (RAM), a programmable ROM (PROM), an electronically-programmable ROM (EPROM), an electronically-erasable PROM (EEPROM), an optical storage disk, an optical storage device, a magnetic storage disk, a magnetic storage device, and/or any other medium that can be used to carry or store program code and/or instructions in the form of machine-readable instructions or data structures, and that can be accessed by a processor, a general-purpose or special-purpose computer, or other machine with a processor (e.g., the example processor platform P100 discussed below in connection with FIG. 14). Combinations of the above are also included within the scope of computer-readable media. Machine-readable instructions comprise, for example, instructions and/or data that cause a processor, a general-purpose computer, a special-purpose computer, or a special-purpose processing machine to implement one or more particular processes. Alternatively, some or all of the example processes of FIGS. 3-8 and 11-13 may be implemented using any combination(s) of ASIC(s), PLD(s), FPLD(s), discrete logic, hardware, firmware, etc. Also, one or more operations of the example processes of FIGS. 3-8 and 11-13 may instead be implemented manually or as any combination of any of the foregoing techniques, for example, any combination of firmware, software, discrete logic, and/or hardware. Further, many other methods of implementing the example operations of FIGS. 3-8 and 11-13 may be employed.
For example, the order of execution of the blocks may be changed, and/or one or more of the blocks described may be changed, eliminated, sub-divided, or combined. Additionally, any or all of the example processes of FIGS. 3-8 and 11-13 may be carried out sequentially and/or carried out in parallel by, for example, separate processing threads, processors, devices, discrete logic, circuits, etc.

The example process 300 of FIG. 3 begins with the example test manager 202 selecting a candidate IP address to test (block 302). Candidate IP addresses may be retrieved from the example market information database 106 based on the IP address(es) associated with known panelists, and/or candidate IP address(es) may be acquired from the example registry database 104 in response to a query associated with an ISP of interest. For example, the example test manager 202 may query the registry database 104 (e.g., the ARIN database) using an organization name (e.g., Verizon) to determine whether the organization has registered one or more IP addresses and/or blocks of IP addresses (block 302).

Turning briefly to the illustrated example process 400 of FIG. 4, the example test manager 202 may invoke the example panelist updater 216 to obtain panelist IP addresses to test and/or re-test. Generally speaking, because many ISPs assign IP addresses to subscribers in a dynamic manner rather than statically, such assigned IP addresses are subject to periodic expiration and reassignment. Accordingly, the one or more panelists may agree to disclose their current IP address, and/or otherwise allow home monitoring equipment to operate within the panelist household, which equipment reports the current IP address periodically, manually, aperiodically, and/or on a scheduled basis. Without limitation, the home monitoring equipment may invoke an application (e.g., ipconfig) and/or script on one or more household personal computers to determine which IP address has been assigned to the panelist household.

In the illustrated example of FIG. 4, the example panelist updater 216 selects a panelist from the market information database 106 (block 402) and obtains the current IP address associated with that panelist (block 404). Additionally, the example panelist updater 216 may obtain, via survey and/or household monitoring device(s) (e.g., Nielsen® People Meter), other panelist ISP information (block 406). Panelist ISP information may include, but is not limited to, the ISP business name and/or the type of ISP services provided to the panelist household (e.g., DSL Internet services, cable Internet services, fiber optic services (FiOS), etc.). Without limitation, the example panelist updater 216 may obtain panelist demographic information from the example market information database 106, surveys, and/or the household monitoring device(s) (block 408). In the event that new and/or additional information is obtained via one or more surveys and/or via the household monitoring device(s) associated with, for example, Nielsen® online audience measurement services, such information is saved to the market information database 106 for future use (block 410). For example, one or more iterations through the illustrated example process 400 of FIG. 4 gather additional candidate IP addresses for testing, as described in further detail below. IP addresses obtained through the one or more iterations may be saved to a memory and/or database for individual testing, which determines, in part, an active or inactive status of each candidate IP address. Accordingly, the example panelist updater 216 and/or the IP address aggregator 204 determines whether there are additional panelists to check (block 412) so that, in part, the example test manager 202 has a sufficient and/or statistically significant number of candidate IP addresses to make one or more market statistic conclusions having a requisite degree of confidence.

While the illustrated example process 400 of FIG. 4 obtains one or more candidate IP addresses for testing from panelists, the illustrated example process 500 of FIG. 5 also obtains candidate IP addresses for testing by determining whether the IP address registrant also manages one or more blocks of neighboring IP addresses. The example panelist updater 216 obtains a seed IP address from a panelist by querying the list of panelists stored in the example market information database 106 (block 502). Using the seed IP address as input for a query to the registry database 104, such as the ARIN database, the example registry manager 210 obtains information to indicate whether the seed IP address has one or more neighboring IP addresses also managed and/or registered by the same organizational entity (block 504). For example, a query to the ARIN database using a panelist seed IP address of 69.47.21.141 reveals, in part, an organization name “WideOpenWest Finance LLC,” which has registered IP addresses ranging from 69.47.0.0 through 69.47.255.255. Based on this discovered range of IP addresses registered by the same organization, the example test manager 202 selects a random number of IP addresses to test (block 506). The selected random IP addresses may be temporarily saved by the test manager 202 in a queue or memory, and the information related to the known range of registered IP addresses for the organization is saved in the example market information database 106 (block 508) for future use. In other words, rather than perform another query to one or more registry databases 104 to determine a range of IP addresses registered by “WideOpenWest Finance LLC,” the test manager 202 may, instead, select another group of random IP addresses to test from the market information database 106. If another seed IP address is to be selected (block 510) to, for example, acquire a greater number of IP addresses to satisfy a statistically significant threshold, control returns to block 502.
Additionally, after saving the results to the example market information database 106 (block 508), a subsequent attempt to obtain an IP address to test by the market share evaluator 102 may not require any further queries to one or more registry databases 104. Instead, the example test manager 202 may acquire candidate IP addresses directly from the market information database 106, which are identified as being associated with a specific ISP.
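By way of illustration only, the random selection of candidate IP addresses from a registered range (block 506) may be sketched as follows. The function name `sample_candidate_ips`, the sample count, and the fixed seed are assumptions made for this sketch and do not appear in the figures; only the address range is taken from the example above.

```python
import ipaddress
import random


def sample_candidate_ips(first_ip, last_ip, count, seed=None):
    """Pick `count` distinct random IPv4 addresses from an inclusive range."""
    start = int(ipaddress.IPv4Address(first_ip))
    end = int(ipaddress.IPv4Address(last_ip))
    if end < start:
        raise ValueError("last_ip must not precede first_ip")
    rng = random.Random(seed)
    # Sampling from a range object avoids materializing every address.
    picks = rng.sample(range(start, end + 1), count)
    return [str(ipaddress.IPv4Address(n)) for n in picks]


# Range registered to the organization found for seed address 69.47.21.141.
candidates = sample_candidate_ips("69.47.0.0", "69.47.255.255", 5, seed=1)
```

Because the range is stored once (block 508), a later call with a different seed may draw a fresh group of candidates without another registry query.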

Returning to the illustrated example process 300 of FIG. 3, after any number of IP addresses are selected (block 302), the example ping manager 206 performs a ping operation using one of the selected candidate IP addresses (block 304) to determine whether echo response messages are received and, if so (block 306), the selected candidate IP address is deemed active (block 308). In other words, the example ping manager 206 determines an activity status of the candidate IP address of interest. On the other hand, if the ping operation does not successfully return one or more echo response messages (block 306), the example test manager 202 directs the port scan manager 208 to initiate a port scan (block 310) and, if successful (block 312), the selected IP address is deemed active (block 308). However, if the port scan is not successful (block 312), the selected IP address is deemed inactive (block 314) and control returns to block 304 to select another candidate IP address to test.
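The ping-then-port-scan decision of blocks 304-314 may be sketched as shown below. This is a minimal illustration only: the probes are passed in as callables (a hypothetical design choice for this sketch) so that the decision logic can be shown without committing to any particular ping or port scan implementation.

```python
from typing import Callable


def is_active(ip: str,
              ping: Callable[[str], bool],
              port_scan: Callable[[str], bool]) -> bool:
    """Deem an address active when either probe succeeds."""
    if ping(ip):          # block 306: an echo response was received
        return True       # block 308: active
    # Block 310: the port scan runs only when the ping fails.
    return port_scan(ip)  # True -> block 308, False -> block 314


# Stub probes for illustration; real probes would contact the network.
status = is_active("192.0.2.10",
                   ping=lambda ip: False,
                   port_scan=lambda ip: True)
```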

IP addresses that are deemed active (block 308) by either a ping or port scan are further used as input for a query to one or more registry databases 104 (block 316), such as the ARIN database. Information retrieved from the registry databases 104 may include, but is not limited to, the name and/or address of the organization that registered the selected IP address, whether the named organization has also registered other IP addresses and/or one or more blocks of IP addresses, and/or a contact name, telephone number, and/or e-mail address of a contact responsible for the IP address. IP addresses that are associated with corporations, hospitals, schools, universities, and/or similar businesses and/or organizations are distinguished from IP addresses associated with residential use by the example market share evaluator 102 (block 318).

In particular, the example registry manager 210 compares the returned organization name with identified organization names stored in the market information database 106. The example market information database 106 includes at least one parameter associated with organization names to identify whether the organization is residential, meaning it provides IP addresses for residential purposes, or whether the organization is non-residential, meaning that it provides and/or otherwise manages IP addresses for corporate, scholastic, and/or any other non-residential purpose (block 318). If the IP address is not residential (block 320), then the example test manager 202 determines whether there are additional IP addresses of interest to test (block 322) and, if so, control returns to block 302. Otherwise, if the example test manager 202 determines that there are no additional IP addresses of interest to test, market share statistics are calculated based on the acquired test results (block 324), as described in further detail below.

In the event that the IP address is deemed residential (block 320), the example market share evaluator 102 determines a corresponding location of the IP address and performs a secondary test to identify a true end user (block 326) as compared to the registrant identified by one or more registry databases (e.g., the organization associated with OrgName by the ARIN database). As described above, the organization identified as responsible for a registered IP address may not be the ultimate end-user, but rather a reseller of IP addresses.

FIG. 6 illustrates the example process 326 to determine a corresponding location of the IP address and perform the secondary test to identify the true end user. In the illustrated example of FIG. 6, the example test manager 202 determines if the IP address of interest is stored in the example market information database 106 (block 602). If so, then the example market share evaluator 102 may not need to expend additional processing resources and/or time to determine true ownership and location information in the event that the stored information is relatively recent. As such, the example test manager 202 determines whether the information associated with the IP address of interest exceeds a date and/or time threshold, which serves as an indication of how recent or current the stored information is. If the threshold is not exceeded (block 604), thereby indicating that the IP address information stored in the example market information database 106 is relatively recent or current, such stored information is relied upon to confirm the true owner and location of the IP address of interest (block 606).
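The stored-information check of blocks 602-606 may be sketched as follows. The 30-day threshold, the dictionary layout, and the field name `verified_at` are assumptions made for this illustration; the description above does not specify a threshold value or a storage schema.

```python
from datetime import datetime, timedelta

MAX_AGE = timedelta(days=30)  # assumed threshold, not taken from the figures


def fresh_record(ip, stored, now, max_age=MAX_AGE):
    """Return the stored record when it is recent enough, otherwise None."""
    record = stored.get(ip)                    # block 602: is the IP stored?
    if record is None:
        return None
    if now - record["verified_at"] > max_age:  # block 604: threshold exceeded
        return None
    return record                              # block 606: rely on stored data


# Hypothetical stored entry for the example address used in the text.
stored = {
    "68.87.148.146": {"isp": "Comcast", "location": "Boston, MA",
                      "verified_at": datetime(2010, 1, 10)},
}
hit = fresh_record("68.87.148.146", stored, now=datetime(2010, 1, 28))
```

A return value of None corresponds to the branch in which a new reverse DNS lookup is performed.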

On the other hand, in the event that the IP address of interest is not stored in the example market information database 106 (block 602) or the time/date threshold is exceeded (block 604), then the example hostname resolver 212 performs a reverse DNS lookup (also referred to as a reverse-resolve) using the IP address of interest to retrieve an associated hostname (block 608). An example response to a reverse DNS lookup using the IP address of interest 68.87.148.146 is “ge-0-1-ubr01.warren.ma.boston.comcast.net.” Also note that an example response to an ARIN database Whois query using this same IP address of interest to identify the organization name is “Comcast Cable Communications, Inc.” Accordingly, both the reverse DNS lookup and the registry database lookup identify the same end-user entity.

However, in some circumstances the organization name from the ARIN query and the reverse DNS will not match, such as when the example IP address of interest is 64.236.16.20. In this example, the ARIN query identifies the organization responsible for registering the IP address as “AOL Transit Data Network.” On the other hand, a reverse DNS using the same IP address of interest reveals “www2.cnn.com.” To resolve such a disparity and determine the true end user associated with the IP address of interest, the example hostname resolver 212 compares reverse DNS labels with labels stored in the example market information database 106 (block 610), as described in further detail in connection with FIG. 7.

In the illustrated example of FIG. 7, the example hostname resolver 212 parses and places received hostname, domain name, and/or router labels in separate fields (block 702) and identifies whether the labels satisfy a match of known residential ISPs stored as resolution keywords in the resolution keyword table 107 of the example market information database 106 (block 704). The one or more keywords in the example resolution keyword table 107 may include, but are not limited to, letters, words, numbers and/or one or more alphanumeric representations that are indicative of an ISP, an organization, and/or a location. In other words, the example hostname resolver 212 applies one or more rules to identify a valid ISP associated with a combination of labels parsed from a received hostname. Continuing with the example above, the hostname “ge-0-1-ubr01.warren.ma.boston.comcast.net” may be analyzed by the hostname resolver 212 so that the labels “ma,” “boston,” and “comcast” are parsed-out for comparison purposes. In this example, the hostname resolver 212 includes a rule in which these three labels, when all identified within a single received hostname, result in identification of Comcast® as the ultimate end-user ISP.

However, assuming that the returned hostname was, instead, “ge-0-1-ubr01.warren.ma.comcast.net,” then the example rule that requires at least three specific labels for proper identification of Comcast® would not be satisfied. Despite the fact that one of the labels includes “comcast,” the possibility exists that Comcast® redistributes the IP address (e.g., as a reseller) to a separate ISP. Accordingly, Comcast® and/or its residential customers would not be the ultimate end-user in this example, thus the IP address of interest cannot be properly associated with a conclusive end-user. Continuing with the example process 610 of FIG. 7, the example hostname resolver 212 determines if any label matches are found in the market information database 106 (block 706) and, if so, whether the required threshold number of label matches has been detected (block 708). If the required threshold number of label matches has been met (block 708), then the IP address of interest is associated with the ISP identified in the market information database 106 and/or a candidate result (block 710).
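The label rule discussed above may be sketched as follows. This is an illustrative reading only: the rule table `RULES` and the function names are hypothetical stand-ins for the resolution keyword table 107, and the sketch models the threshold of block 708 as requiring every label listed in a rule.

```python
def parse_labels(hostname):
    """Split a hostname into lowercase labels (block 702)."""
    return [label.lower() for label in hostname.strip(".").split(".") if label]


# Hypothetical rule: all three labels must appear in a single hostname.
RULES = [
    {"isp": "Comcast", "required_labels": {"ma", "boston", "comcast"}},
]


def resolve_isp(hostname, rules=RULES):
    """Return the ISP whose required labels are all present, else None."""
    labels = set(parse_labels(hostname))
    for rule in rules:
        if rule["required_labels"] <= labels:  # blocks 706 and 708
            return rule["isp"]                 # block 710
    return None                                # unresolved hostname


resolved = resolve_isp("ge-0-1-ubr01.warren.ma.boston.comcast.net")
unresolved = resolve_isp("ge-0-1-ubr01.warren.ma.comcast.net")
```

With these inputs, the first hostname satisfies the rule and the second does not, because the label “boston” is absent.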

Returning to FIG. 6, if the hostname ownership is not resolved after one or more queries to the market information database 106 as described in connection with FIG. 7 (block 612), then the received hostname from the reverse DNS is flagged for manual follow-up (block 614). The manual follow-up may include, but is not limited to, one or more users investigating new, alternate, and/or otherwise unrecognized label keywords to determine an associated ISP organization name and/or corresponding location. Such newly discovered information may further be appended to the example market information database 106 and/or resolution keywords 107 so that future/subsequent circumstances allow immediate resolution when those label keywords are encountered. This circumstance may arise in the event a new ISP enters the market having hostname labels not previously identified during one or more tests of IP addresses. As a result, the market information database 106 described herein continues to become more complete the longer it is used with the methods and apparatus described herein to measure broadband market statistics.

If, on the other hand, the received hostname is resolved (block 612), then the example hostname resolver 212 initiates a multi-location trace-route (block 616) to confirm a common last hop router hostname (block 618). For example, if a first trace-route for the IP address of interest 68.87.148.146 that originates in San Francisco (e.g., initiating router hostname “bdrrtr-a.22.c4.sf2.telephia.com”), a second trace-route for the same IP address that originates in Dallas (e.g., initiating router hostname “tbr1.dlstx.ip.att.net”), and a third trace-route for the same IP address of interest that originates in Lindbergh (e.g., initiating router hostname “host-criterion-83-225.customer.ntelos.net”) all identify a last hop router hostname “ge-0-1-ubr01.warren.ma.boston.comcast.net,” then the labels indicative of location are extracted to identify the corresponding city and/or state in which the IP address of interest is located (block 620). In this example, the label cues “boston” and “ma” identify the city of Boston in the state of Massachusetts. However, in the event that there is no indicator of location in the hostname, then any information in the example market information database 106 will be used.
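The agreement check and location extraction of blocks 618 and 620 may be sketched as shown below. The trace-routes themselves are not performed here; the sketch assumes that the last hop hostname reported from each vantage point has already been collected. The table `LOCATION_LABELS` is a hypothetical stand-in for location keywords stored in the market information database 106.

```python
def common_last_hop(last_hops):
    """Return the shared last hop hostname, or None when the hops disagree."""
    unique = {hop.lower() for hop in last_hops}
    return unique.pop() if len(unique) == 1 else None


# Hypothetical location keywords; a real table would be far larger.
LOCATION_LABELS = {"boston": "Boston", "ma": "Massachusetts"}


def extract_location(hostname, table=LOCATION_LABELS):
    """Collect location names for the labels that appear in the table."""
    return [table[label] for label in hostname.lower().split(".")
            if label in table]


# Last hop reported from the San Francisco, Dallas, and Lindbergh origins.
hops = ["ge-0-1-ubr01.warren.ma.boston.comcast.net"] * 3
shared = common_last_hop(hops)                        # block 618
location = extract_location(shared) if shared else []  # block 620
```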

If, instead, all three trace-routes do not identify a common last hop router hostname (block 618), then the example hostname resolver 212 compares each returned hostname for a match in the example market information database 106 and resolves a corresponding location if a match is found (block 622), otherwise the IP address of interest is flagged for follow-up (block 624).

Returning to the illustrated example process 300 of FIG. 3, if the test manager 202 does not have any additional IP addresses of interest to test (block 322), then the test manager 202 calculates one or more broadband market statistics (block 324) such as, for example, market share. The illustrated example process 300 of FIG. 3 also includes an example sub-process 328 (dashed-line) that may be performed by a calling process, as described in further detail below in connection with FIG. 11. Turning to FIG. 8, the example test manager 202 selects an ISP of interest for which broadband market statistics are desired (block 802). Data returned from the one or more iterative tests of IP addresses is stored in the market information database 106, and for each ISP of interest the number of detected active IP addresses is divided by the total number of IP addresses tested to yield a percentage of active IP addresses for the given time period (block 804). In the event that the total number of IP addresses tested satisfies a threshold value indicative of a statistically significant sample size, the example test manager 202 calculates one or more projection estimates for the ISP of interest in the given market area (e.g., the first market area 118, the second market area 120, etc.) (block 806). For example, if a number of assigned IP addresses is found to be 325 out of 500 IP addresses tested, then a factor of 0.65 is calculated as a factor indicative of active IP addresses. Additionally, if the ISP of interest is believed to control a quantity of, for example, 10,000 IP addresses (e.g., as determined from one or more registry databases, such as the ARIN database), then a total number of active IP addresses may be calculated/projected as 6,500 (e.g., 325÷0.05, where 0.05 is the fraction of the 10,000 IP addresses that was tested, or 0.65×10,000).
If the user desires to calculate one or more broadband market statistics for alternate ISPs (block 808), then control returns to block 802, otherwise one or more output tables may be generated for the user (block 810), such as the example output tables 900 and/or 950 shown in FIGS. 9A and 9B.
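The projection arithmetic of blocks 804 and 806 may be sketched with the figures from the example above (325 active addresses out of 500 tested, and 10,000 registered addresses). The function name `project_active` is an assumption of this sketch, and the statistical significance check on the sample size is omitted.

```python
def project_active(active_tested, total_tested, registered_total):
    """Return (active fraction, projected number of active addresses)."""
    if total_tested <= 0:
        raise ValueError("at least one address must be tested")
    fraction = active_tested / total_tested                      # block 804
    projected = active_tested * registered_total / total_tested  # block 806
    return fraction, projected


fraction, projected = project_active(325, 500, 10_000)
```

Here `fraction` is 0.65 and `projected` is 6,500, matching the worked example; dividing 325 by the tested fraction (500/10,000 = 0.05) gives the same projection.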

The example data output table 900 of FIG. 9A includes an IP address column 902, a reverse DNS lookup column 904 to identify a hostname corresponding to the IP address, a Whois column 906 to identify a corresponding registrant name associated with the IP address, a ping column 908 to identify whether or not a ping operation was successful, and a port column 910 to identify whether or not one or more port scanning operations were successful. Additionally, the example table 900 of FIG. 9A includes a geography column 912 to identify the location of the IP address, and an active column 914 to identify whether the IP address of column 902 is active or inactive.

The example data output table 950 of FIG. 9B includes an ISP column 952, an IP address range column 954 to identify which groups of IP addresses (e.g., one or more grouped ranges of IP addresses) the ISP is responsible for, a number of IP addresses tested column 956 to identify the total number of IP addresses tested from the pool of IP addresses in column 954, and a number of active IP addresses column 958. The example data output table 950 also includes a number of inactive IP addresses column 960 and a corresponding percentage of active IP addresses based on the total number of IP addresses tested column 962. While FIGS. 9A and 9B illustrate two example data output tables that may be generated and/or calculated by the example market share evaluator 102, the methods and apparatus described herein are not limited thereto. Any other data output arrangement may be generated by the market share evaluator 102 to accommodate for varying needs of the user(s) such as, but not limited to, ranked data tables to illustrate ISPs having the largest or smallest market share for a city, a state, a region, and/or data tables to illustrate a rate of change in subscribers from one time period to another time period. Additionally, the example data tables 900 and 950 calculated by the example market share evaluator 102 may also identify one or more trends related to subscriber increases, subscriber decreases, and/or one or more effects in response to advertising and/or promotional efforts. For example, the ISP using the methods and apparatus described herein may establish a baseline broadband subscriber share and/or a rate of new subscribers for the ISP and/or competitors of the ISP prior to initiating one or more advertising and/or promotional campaigns.
During such advertising and/or promotional campaigns, the ISP may continue to employ the methods and apparatus described herein to determine the effect of such advertising efforts (e.g., a threshold gain in the percentage of subscribers, an increase in the rate of new subscribers, etc.).

FIG. 10 is an illustration of an example system 1000 to measure market share and calculate flow share. The methods and apparatus described herein may measure and/or otherwise calculate market share and/or flow share for broadband markets (e.g., broadband Internet services, fiber Internet services, etc.) and/or video markets (e.g., over the air services, satellite services, cable services, etc.). For the sake of brevity, the example system 1000 of FIG. 10 includes elements similar to those in the example system 100 of FIG. 1 and similar reference numbers are shown in FIG. 10 to represent similar elements. In the illustrated example of FIG. 10, the system 1000 includes a United States Census Bureau database 1050, a managed panel database 1052, a convenience database 1054, and a survey pool 1056. The example US Census Bureau database 1050 includes information related to US demographics for the Northeastern United States, the Midwestern United States, the Western United States, and the Southern United States. As described in further detail below, the example US Census Bureau database 1050 provides one or more benchmarks when applying weighting factors to household data. The example managed panel database 1052 includes, in part, panelist data having a relatively high confidence representation of demographic groups. For example, the managed panel database 1052 confidence allows for projections to a larger universe of a demographically identified population based on one or more statistical guidelines that conform to the Interagency Council on Statistical Policy (ICSP), which is led by the United States Office of Management and Budget.

The example convenience database 1054 includes data and/or panelists associated with one or more households, but is not managed in a manner to conform to the ICSP. For example, the convenience database 1054 may include panel sample sizes of a relatively lower amount than the example managed panel database 1052 and, as a result, cost significantly less to maintain. One or more projections applied in view of the example convenience database 1054 include an associated confidence level that is lower than that of the example managed panel database 1052. Similarly, the example survey pool 1056 includes survey questions and corresponding survey answers from available participants. Such survey answers provide data at a significantly lower operating cost when compared to the managed panel database 1052 and/or the convenience database 1054, but may also include a relatively lower confidence level.

To calculate market share and/or flow share in view of available data provided by the example convenience database 1054, which provides, in part, additional insight into household broadband and/or video provider consumer selections, multiple data sources are employed in a hybrid manner. Available data in the one or more convenience databases may include, but is not limited to, any number of indications of associated IP address status, such as a new subscriber ISP, an unchanged ISP associated with an IP address having a new home address, and/or a service type (e.g., dial-up, broadband, fiber, video, etc.). For example, an overall market size and/or market penetration of Internet use is provided by the managed panel database 1052, while information related to specific ISPs is provided by the example convenience database 1054 in view of particular geographic locations. To calculate flow share data in view of available data provided by the example convenience database 1054 and available survey data, which allows an indication of new movers, disconnects, and/or new inwards for the service providers, multiple data sources are employed in a hybrid manner, as described in further detail below.

FIG. 11 illustrates an example process 1100 to calculate service provider flow share using information from one or more managed databases and one or more convenience databases, such as the example managed panel database 1052 and the example convenience database 1054. While the examples described in connection with FIG. 11 include both managed and convenience databases, the methods and apparatus described herein may use only managed or only convenience databases instead. If so, sample sizes and weights associated with such databases may be adjusted to maintain statistical integrity of one or more projections and/or calculations. In the illustrated example of FIG. 11, the example test manager 202 queries the managed panel database 1052 to acquire a service penetration value (block 1102), such as a value of Internet services and/or video services (e.g., Internet based video services). The received Internet penetration value includes service subscribers, such as subscribers of dial-up, broadband, video, and/or fiber-based services, but does not typically identify which service provider is responsible for those services. While the example managed panel database 1052 reports, for instance, an Internet penetration value associated with one or more demographic groups and/or one or more geographic areas believed to be highly representative, the management of the managed panel database 1052 may require that some households be dropped from the panel while new households be added in an effort to maintain the relatively high degree of representation in conformance with ICSP. To bridge a gap in information not available to the example managed panel database 1052, the example test manager 202 queries the convenience database 1054 to retrieve a valid household IP address associated with a panel member.
Each household member associated with the example convenience database 1054 includes an associated household identification number so that, in part, changes in household activity may be tracked from one time period to another time period. Additionally, each household member associated with the example convenience database 1054 includes monitoring equipment and/or monitoring software that reports a current IP address on a periodic, aperiodic, manual, and/or scheduled basis. As a result, in the event that the household associated with the household identification number reports a different IP address at a later time period, the methods and apparatus described herein may identify that the household may have changed Internet service providers.
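Tracking reported IP addresses by household identification number across two time periods may be sketched as follows. The identifiers and addresses are invented for illustration (the addresses come from ranges reserved for documentation). A changed address only marks a household as a candidate for a provider change, because dynamic reassignment by the same ISP also produces a new address.

```python
def households_with_changed_ip(previous, current):
    """Return household IDs whose reported IP differs between two periods."""
    return sorted(
        household_id
        for household_id, ip in current.items()
        if household_id in previous and previous[household_id] != ip
    )


# Hypothetical reports keyed by household identification number.
period_1 = {"HH-001": "192.0.2.10", "HH-002": "198.51.100.7"}
period_2 = {"HH-001": "192.0.2.10", "HH-002": "203.0.113.45",
            "HH-003": "203.0.113.9"}

changed = households_with_changed_ip(period_1, period_2)
```

Only households flagged in `changed` would then need to be passed to an ISP verification step; a household that appears for the first time in the later period is not flagged.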

The household members associated with each household identification number in the convenience database 1054 also provide information related to their geographic location (e.g., zip code), which is retrieved by the example test manager 202 (block 1106). Additionally, the example test manager 202 retrieves information reported by the household members to identify whether they have dial-up, broadband, fiber, or other Internet services (block 1108), and retrieves information reported by the household members to identify which ISP provides such Internet services (block 1110). However, because the example convenience database 1054 is not a managed database, the possibility exists that the self reported information from the household members is not accurate. To verify whether the self reported ISP information retrieved by the example test manager 202 (block 1110) is accurate, the test manager 202 invokes an example ISP verification process 328 (as shown by the dashed border of FIG. 3), as described above in connection with FIGS. 3, 6, and 7.

In the illustrated example process 1100 of FIG. 11, the test manager 202 determines whether the ISP information from the convenience database 1054 matches the ISP information from the example ISP verification process 328 (block 1112). If not, the example test manager 202 determines whether to override the ISP information in the convenience database (block 1114) based on, for example, an instruction from a user of the example system 1000 and/or in circumstances when the ISP verification process 328 is not capable of identifying the ISP. If the ISP information is to be overridden (block 1114), the example test manager 202 uses the ISP information retrieved from the ISP verification process 328 as the trusted ISP for a given panelist from the convenience database 1054 (block 1116), and one or more weighting adjustments are applied to the information associated with the given panelist identification number stored in the convenience database 1054 to reflect a confidence factor and/or reliability of information associated with the panelist (block 1118).

However, if the ISP information matches (block 1112), or if the convenience database 1054 ISP information is not to be overridden (block 1114), then the example test manager 202 applies a weighting factor indicative of a greater degree of confidence in the data integrity/accuracy of the convenience database 1054 (block 1119). For example, the applied weighting factor (block 1119) may include a unity weight (e.g., 1.00) in the event that the convenience database 1054 is highly trusted (e.g., due to a relatively large number of samples, due to a degree of consistency as compared to a managed database and/or ICSP standards, etc.). On the other hand, the example test manager 202 may apply a default weight to one or more convenience databases indicative of a decreased degree of confidence by virtue of the fact that the database is not managed.
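
The decision flow of blocks 1112 through 1119 can be sketched as follows. This is an illustrative sketch only: the function name `weight_panelist`, the case-insensitive comparison, and the numeric weights (1.0 for a match, 0.5 after an override) are assumptions, with the unity weight borrowed from the example above.

```python
def weight_panelist(self_reported_isp, verified_isp, allow_override=True,
                    match_weight=1.0, override_weight=0.5):
    """Return (trusted_isp, weight) for one panelist.

    `verified_isp` is None when verification could not identify the ISP.
    The weight values are hypothetical placeholders, not disclosed values.
    """
    reported = self_reported_isp.strip().lower()
    verified = verified_isp.strip().lower() if verified_isp else None

    # Block 1112: does the self reported ISP match the verified ISP?
    if verified is not None and reported == verified:
        return self_reported_isp, match_weight        # block 1119

    # Block 1114: override only if permitted and an ISP was identified.
    if allow_override and verified is not None:
        # Blocks 1116 and 1118: trust the verified ISP, adjust the weight.
        return verified_isp, override_weight

    # No override: keep the self reported ISP (block 1119).
    return self_reported_isp, match_weight
```

In practice the weights would likely be derived from the confidence placed in the data source rather than fixed constants.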

The example test manager 202 continues to evaluate any remaining panelist information and associated IP addresses in the convenience database (block 1120). In the event that there are additional panelists to evaluate (block 1120), control returns to block 1104, otherwise the example test manager 202 retrieves household demographic information from the convenience database (block 1122), such as, but not limited to, indicators of race, marital status, income, and/or education. Additionally, the example data source weight calculator 220 applies one or more household weighting factors to reflect benchmarks associated with U.S. Census region data from the example U.S. Census database 1050 (block 1124). The applied census weights allow, in part, a reduction of the bias associated with the convenience data source (e.g., database) by confirming that the convenience database aligns in a manner with a more reliable standard source (i.e., the U.S. Census database 1050). In view of the possibility that one or more census region(s) include sub-populations of additional and/or alternate demographic presence, the example test manager 202 applies a stratified sampling to such sub-populations within each census region (block 1126). Additionally, the application of the stratified sampling allows for control (e.g., minimize) of one or more geographic biases. Projections of household subscriber counts are performed by the example test manager 202 based on the stratified sampling output to yield market share for each ISP within a given geography of interest (block 1128). Based on the resolution, representation, and/or confidence level of the example convenience database 1054, the one or more market share projections may be made at a direct marketing area (DMA) level or, if a threshold confidence level is met and/or exceeded, a city and/or township level may be projected having increased granularity. The example test manager 202 may then calculate broadband flow share (block 1130).
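
One way to picture the benchmark weighting and stratified projection of blocks 1124 through 1128 is the following minimal sketch. It uses a standard post-stratification estimate, which is an assumption and not necessarily the projection used by the example test manager 202. The function name `project_market_share`, the `(stratum, isp, weight)` sample layout, and the `stratum_subscribers` benchmark mapping (for instance, census household counts scaled by a penetration value) are hypothetical.

```python
from collections import defaultdict


def project_market_share(sample, stratum_subscribers):
    """Project per-ISP market share from a weighted, stratified sample.

    sample: iterable of (stratum, isp, weight) tuples (hypothetical layout).
    stratum_subscribers: {stratum: benchmark subscriber household count}.
    Returns {isp: share}, with shares summing to 1.0.
    """
    weight_by_stratum = defaultdict(float)
    weight_by_cell = defaultdict(float)
    for stratum, isp, weight in sample:
        weight_by_stratum[stratum] += weight
        weight_by_cell[(stratum, isp)] += weight

    # Scale each ISP's within-stratum share to the stratum benchmark.
    projected = defaultdict(float)
    for (stratum, isp), cell_weight in weight_by_cell.items():
        stratum_weight = weight_by_stratum[stratum]
        if stratum_weight <= 0:
            continue  # skip strata with no usable weight
        projected[isp] += stratum_subscribers[stratum] * cell_weight / stratum_weight

    total = sum(projected.values())
    if total == 0:
        return {}
    return {isp: count / total for isp, count in projected.items()}
```

Because each stratum is scaled to its own benchmark, an over-sampled area does not dominate the projected shares, which is the geographic bias control the paragraph above describes.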

FIG. 12 illustrates an example process 1130 to calculate broadband flow share using information from the one or more managed panel databases 1052, convenience databases 1054, and survey data from the example survey pool 1056. While the example survey pool data 1056 offers a more economical way to obtain respondent/customer feedback and other information, the sample size required to project to larger populations becomes cost prohibitive. Additionally, because the survey pool data 1056 is not typically managed and/or verified to be within one or more measured levels of representation, the possibility of incorrect answers from survey respondents exists. However, survey pool data 1056 includes valuable information related to broadband gross additions, new movers, and/or new inward behaviors. The example survey pool data 1056 may include, but is not limited to, diary data.

An example process 1200 is shown in FIG. 12 to calculate one or more flow share measurements that identify gross additions, new movers, and/or new inward behaviors associated with service providers. In the illustrated example of FIG. 12, the example test manager 202 retrieves one or more indications of an IP address switch from the example convenience database 1054 (block 1202). To determine information related to prior customer/subscriber ISP information, whether the customer/subscriber moved to a new home and obtained a new ISP, whether the customer/subscriber moved to a new home and maintained the same ISP, and/or whether the customer/subscriber obtained a new type of service (e.g., dial-up to broadband), the example test manager 202 retrieves survey responses from the example survey pool 1056 (block 1204). The example survey information obtained from the example survey pool 1056 may include any other types of responses including, but not limited to, former ISP names, new ISP names, duration of ISP service, type of ISP service (e.g., dial-up, broadband, cable, DSL, fiber, free Wi-Fi, wireless air card, etc.), and/or one or more service names associated with the ISP (e.g., AT&T UVerse, etc.).

While the received data from the example survey pool 1056 may contain a sample size significantly lower than that found in a managed data source, the example test manager 202 applies one or more demographic benchmarks to the received survey data in a manner consistent with U.S. census data from the example U.S. Census database 1050 (block 1206). Additionally, the market share projections calculated in connection with the example process 1100 of FIG. 11 are applied to the survey data to constrain flow share volumes in a manner consistent with the key competitive geographies of a selected client/service provider (block 1208). In other words, flow share data returned to a given client may be tailored to be relevant to only those geographic areas in which the given client competes with other service providers. Within each geographic area of interest, the example test manager 202 uses the market share data and benchmarked survey data to project gross adds and deactivations for the client and the key competitors of the client based on known geographic regions of competition (block 1210). In some examples, a change in a provider for a panelist and/or survey respondent may be interpreted as a deactivation and a gross add, while a subscriber that moves to an alternate address may be interpreted as a deactivation and a gross add when that subscriber maintains the same service provider. In other examples, a subscriber that moved to a new address and is new to broadband services may be interpreted as a gross addition, while a subscriber that drops broadband services in favor of, for example, dial-up services may be interpreted as a deactivation. Further, in other examples a broadband subscriber that previously had dial-up service or no service may be interpreted as a gross addition. Net additions may further be calculated as the difference between gross additions and subscriber deactivations.
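
The interpretation rules above (provider switch, move with the same provider, new broadband subscriber, dropped broadband service) can be tallied as in the following sketch. The event layout and the function name `tally_flow` are hypothetical, and `None` is assumed to stand for "no broadband service" (for example, dial-up or no service at all).

```python
from collections import Counter


def tally_flow(events):
    """Tally gross adds, deactivations, and net adds per provider.

    events: iterable of dicts with keys 'old_isp', 'new_isp' and an
    optional 'moved' flag (hypothetical layout). None means the
    subscriber had, or now has, no broadband service.
    """
    gross_adds, deactivations = Counter(), Counter()
    for event in events:
        old, new = event["old_isp"], event["new_isp"]
        moved = event.get("moved", False)
        if old == new and not moved:
            continue  # same provider, same address: no flow
        if old is not None:
            deactivations[old] += 1   # switch, move, or dropped service
        if new is not None:
            gross_adds[new] += 1      # switch, move, or new to broadband
    providers = set(gross_adds) | set(deactivations)
    return {
        p: {
            "gross_adds": gross_adds[p],
            "deactivations": deactivations[p],
            # Net additions = gross additions - deactivations.
            "net_adds": gross_adds[p] - deactivations[p],
        }
        for p in providers
    }
```

Note that a move with an unchanged provider counts as both a deactivation and a gross add for that provider, so its net additions are unaffected.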

One or more verification procedures may be performed to ensure that the aforementioned use of survey data includes a threshold level of confidence, thereby increasing the marketable value of the projected flow share data (e.g., gross adds, deactivations, new inroads, new movers, etc.). In the illustrated example process 1200 of FIG. 12, the test manager 202 calculates a difference between gross additions and disconnects to yield a change in the number of subscribers for a given time period (block 1212). Such a calculation is derived from information received from survey pool data. However, information received from the convenience database 1054 may be used by the example test manager 202 to calculate a difference between a client starting and ending subscriber count value to determine a change in the number of subscribers for a given time period (block 1214). In the event that the difference between the calculated number of subscribers for the given time period (blocks 1212 and 1214) does not exceed a threshold value (block 1216), then the projected flow share results are provided in a report for each client (block 1218). However, in the event that the example threshold value is exceeded (block 1216), then the projected flow share results may be averaged, normalized, and/or scaled (block 1220) to better represent expected results for the given client of interest and/or geography of interest.
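
The cross-check of blocks 1212 through 1220 can be sketched as below. The function name `reconcile_net_change` and its parameters are assumptions for illustration, and simple averaging is used as just one of the adjustment options (averaging, normalizing, and/or scaling) mentioned above.

```python
def reconcile_net_change(gross_adds, deactivations, start_count, end_count,
                         threshold):
    """Compare survey-based and panel-based subscriber changes.

    All parameter names are hypothetical. Returns a dict with the
    net change to report and whether an adjustment was applied.
    """
    survey_change = gross_adds - deactivations     # block 1212
    panel_change = end_count - start_count         # block 1214
    gap = abs(survey_change - panel_change)

    if gap <= threshold:                           # block 1216
        # Within tolerance: report the survey-based result (block 1218).
        return {"net_change": survey_change, "adjusted": False}

    # Threshold exceeded: average the two estimates (block 1220).
    return {"net_change": (survey_change + panel_change) / 2,
            "adjusted": True}
```

For example, with 120 gross additions, 100 disconnects, and a subscriber count that moves from 1000 to 1025, the two estimates (20 and 25) differ by 5, so a threshold of 10 leaves the survey-based result unadjusted.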

While the aforementioned examples to calculate market share and/or flow share identified broadband Internet services and broadband ISPs, the methods and apparatus described herein are not limited thereto. For example, the methods and apparatus described herein facilitate calculating market share and/or flow share for media providers, such as providers of satellite, cable, and/or over-the-air services. Similar to the ISPs described above, a stand-alone survey for media providers will not typically allow a cost effective approach to gathering an appropriate sample size to meet statistically significant requirements for a target geography. Although one or more managed data sources, such as the example managed panel database 1052 of FIG. 1, include panelist data having a relatively high degree of geographic and demographic representation (e.g., one or more managed databases that satisfy standards promulgated by the ICSP), such managed data sources may still have an insufficient sample size to allow projection(s) to a larger population. While adding highly managed panelists to a managed database may allow for a sufficient sample size for purposes of projection, cost restraints may prohibit such efforts.

FIG. 13 illustrates an example process 1300 to calculate market share and/or calculate flow share using information from multiple data sources. At least one benefit of the methods and apparatus described herein to calculate market share and/or flow share is that one or more data sources may be combined that, standing alone, would otherwise fail to allow for such calculations having accepted accuracy standards. In the illustrated example of FIG. 13, the example data source locator 218 identifies two or more candidate data sources to reach a threshold number of samples (block 1302). The threshold number of samples may differ based on the target geography for which projections are to be made. Data sources may include, but are not limited to, managed data sources and/or convenience data sources. If the example data source locator 218 does not identify any candidate managed data sources from, for example, the database information pool 1060, then a corresponding requisite (threshold) number of samples may be increased to account for potential decreases in representation confidence levels of the convenience data source(s).

Each identified managed data source includes an associated weight based on, in part, a representative geography from which the samples were derived. Additionally or alternatively, a corresponding weight may be calculated for each identified data source. For example, a managed data source, such as the Nielsen® PeopleMeter® operated and managed by The Nielsen Company®, collects samples on a national level and has a geographic weight associated with its sample diversity. However, other data sources having a derived group of samples from a localized geography (e.g., regional, DMA-level, city, township, etc.) include a corresponding weight that differs from the national level. As a result, merely combining the two example data sources without adjustments to their corresponding weights would not satisfy statistical standards that comply with, for example, the ICSP. The example data source weight calculator 220 retrieves and/or calculates the corresponding weight for each managed data source (block 1304) and determines whether each data source was derived from a dissimilar geographic scope (block 1306). If not, then the data sources may be combined by the example data source combiner 222 without re-weighting the combined samples (block 1308), otherwise the example data source combiner 222 combines samples from each data source and the data source weight calculator 220 re-calculates the combined samples to yield a combined weighting value and/or normalized weighting value (block 1310). If there are additional candidate data sources to combine (block 1312), control returns to block 1306.
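
The combine-and-re-weight decision of blocks 1306 through 1310 might be sketched as follows. This is only one possible reading: the disclosure does not specify the re-weighting formula, so normalizing the per-sample weights to sum to one is an assumption, as are the function name `combine_sources` and the `scope`/`weight`/`samples` record layout.

```python
def combine_sources(sources):
    """Combine samples from several data sources.

    sources: list of dicts with keys 'scope' (e.g., 'national', 'DMA'),
    'weight' (per-sample weight) and 'samples' (list). Hypothetical layout.
    Returns a list of (sample, weight) pairs.
    """
    combined = [(sample, source["weight"])
                for source in sources
                for sample in source["samples"]]

    scopes = {source["scope"] for source in sources}
    if len(scopes) <= 1:
        # Block 1308: similar geographic scope, no re-weighting needed.
        return combined

    # Block 1310: dissimilar scopes, so normalize the weights (assumed
    # approach) to yield a normalized weighting value per sample.
    total = sum(weight for _, weight in combined)
    if total == 0:
        return combined
    return [(sample, weight / total) for sample, weight in combined]
```

A production approach would more likely calibrate each source against a shared benchmark rather than apply a single normalization pass.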

To provide projections in a manner that is meaningful to a client, such as a provider of Internet services or a provider of video media, the example data source weight calculator 220 derives a benchmark weight based on the managed data source(s) for each client and their corresponding geographic area (block 1314). For example, while a national managed data source includes information related to video service providers across the nation, such data may not be representative of a specific geography in which the video provider has a presence (e.g., San Francisco). As such, deriving and/or calculating the benchmark weight based on a scope of the managed data source(s) allows the service provider(s) to calibrate projections in a manner more consistent with the limited geography and/or demographics in which they operate.

The example data source weight calculator 220 also retrieves and/or calculates a weight for each convenience data source (block 1316) and determines whether the identified data source(s) have a dissimilar geographic scope (block 1318). If not, then the example data source combiner 222 combines the data sources (block 1320), otherwise the data sources are combined and re-weighted by the data source weight calculator 220 to maintain statistical integrity for later calculations and/or projections (block 1322). If there are additional convenience data sources to combine (block 1324), control returns to block 1318, otherwise the convenience data sources are adjusted and/or calibrated for timing differences with respect to the managed data sources (block 1326). Timing differences may occur in the event that the convenience data source data temporally lags behind data obtained by the managed data sources. Such timing lags may occur when the convenience data sources are acquired after a relatively longer period of time, such as a week or two weeks in the case of diary data sources. All of the managed data sources and the convenience data sources are combined and re-weighted in the aggregate (block 1328) before calculating market share (block 1330) and/or calculating flow share in a manner that is bounded by the market share calculation(s) (block 1332).

FIG. 14 is a schematic diagram of an example processor platform P100 that may be used and/or programmed to implement any or all of the example market share evaluator 102, the example test manager 202, the example IP address aggregator 204, the example ping manager 206, the example port scan manager 208, the example registry manager 210, the example hostname resolver 212, the example network interface 214, and/or the example panelist updater 216 of FIGS. 1, 2, and 10. For example, the processor platform P100 can be implemented by one or more general-purpose processors, processor cores, microcontrollers, etc.

The processor platform P100 of the example of FIG. 14 includes at least one general-purpose programmable processor P105. The processor P105 executes coded instructions P110 and/or P112 present in main memory of the processor P105 (e.g., within a RAM P115 and/or a ROM P120). The processor P105 may be any type of processing unit, such as a processor core, a processor and/or a microcontroller. The processor P105 may execute, among other things, the example processes of FIGS. 3-8 and 11-13 to implement the example methods and apparatus described herein.

The processor P105 is in communication with the main memory (including a ROM P120 and/or the RAM P115) via a bus P125. The RAM P115 may be implemented by dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), and/or any other type of RAM device, and ROM may be implemented by flash memory and/or any other desired type of memory device. Access to the memory P115 and the memory P120 may be controlled by a memory controller (not shown). The example memory P115 may be used to implement the example market information database 106 of FIGS. 1 and 10.

The processor platform P100 also includes an interface circuit P130. The interface circuit P130 may be implemented by any type of interface standard, such as an external memory interface, serial port, general-purpose input/output, etc. One or more input devices P135 and one or more output devices P140 are connected to the interface circuit P130.

Although certain example methods, apparatus and articles of manufacture have been described herein, the scope of coverage of this patent is not limited thereto. On the contrary, this patent covers all methods, apparatus and articles of manufacture fairly falling within the scope of the appended claims either literally or under the doctrine of equivalents.

1-26. (canceled)
27. A system to correct data source error, comprising: means for resolving to: identify a first hostname label based on a reverse resolve of an Internet protocol (IP) address corresponding to a household; parse the first hostname label to identify a service provider name; invoke a trace route of the IP address corresponding to the household, the trace route originating from two or more originating IP addresses to identify second and third hostname labels with corresponding router last hops, the two or more originating IP addresses different from each other and targeting the same IP address corresponding to the household; and when the second and third hostname labels include a matching location indicator, confirm the service provider name is valid; and means for calculating to, when the first hostname label service provider name matches a service provider name in self reported data, correct an error metric of a data source storing the self reported data by adjusting a weight value associated with the self reported data.
28. The system as defined in claim 27, wherein the matching location indicator includes at least one of a city name or a state name.
29. The system as defined in claim 27, wherein the calculating means is to increase the weight value in response to identifying a match between (a) the first hostname label service provider name from the reverse resolve and (b) the service provider name in the self reported data.
30. The system as defined in claim 27, wherein the calculating means is to decrease the weight value in response to identifying a dissimilarity between (a) the first hostname label service provider name from the reverse resolve and (b) the service provider name in the self reported data.
31. The system as defined in claim 27, wherein the calculating means is to calculate a service provider flow share based on the adjusted weight value and a service penetration value from a managed data source.
32. The system as defined in claim 31, wherein the service penetration value is at least one of a value of Internet services or a value of video services.
33. The system as defined in claim 27, wherein the second and third hostname labels include two or more matching location indicators.
34. The system as defined in claim 27, further including means for managing to determine whether the IP address from the household is active or inactive based on a ping operation.
35. The system as defined in claim 34, wherein the managing means is to: transmit an echo request packet to the IP address; and in response to receiving a reply message from the IP address, determine that the IP address is active.
36. The system as defined in claim 35, further including means for port scanning to, when the echo request packet is blocked, scan one or more ports of a machine associated with the IP address to determine the IP address is active.
37. The system as defined in claim 27, wherein the resolving means is to parse the first hostname label into two or more parsed labels to identify the service provider name.
38. The system as defined in claim 27, wherein the resolving means is to identify a match of known Internet service providers stored as resolution keywords.
39. The system as defined in claim 27, wherein the resolving means is to: parse the first hostname label into parsed labels; and satisfy a rule to identify the service provider name.
40. The system as defined in claim 27, wherein the calculating means is to select the IP address corresponding to the household from a market information database.
41. The system as defined in claim 40, wherein the IP address corresponds to a panelist, further including means for updating to determine demographic information associated with the panelist.